Add optional schema enforcement for KG builder as a validation layer after entity and relation extraction #296

Merged
merged 10 commits into from
Mar 12, 2025

Conversation

NathalieCharbel
Contributor

Description

This PR adds an optional schema enforcement layer to validate the extracted entities, relations, and properties against the schema that is passed to the LLM as guidance for the extraction. This change introduces:

  • A new _clean_graph method in the LLMEntityRelationExtractor class that filters out invalid nodes, relationships, and properties based on a provided SchemaConfig object.
  • Support for different enforcement modes (e.g., NONE vs. STRICT), currently implemented as an Enum.
  • Unit tests covering various enforcement scenarios (no schema, invalid nodes, invalid relations, etc.).
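
As a rough illustration of how an Enum-based enforcement mode can gate the cleanup step, here is a minimal sketch. The names `EnforcementMode` and `apply_enforcement`, the `clean` callback, and the choice to skip cleanup when no schema is given are assumptions made for this example, not the identifiers or behavior confirmed by this PR.

```python
from enum import Enum
from typing import Any, Callable, Optional


class EnforcementMode(str, Enum):
    """Hypothetical enforcement modes mirroring NONE vs. STRICT."""

    NONE = "NONE"      # keep the extracted graph as-is
    STRICT = "STRICT"  # drop anything not conformant to the schema


def apply_enforcement(
    graph: Any,
    schema: Optional[Any],
    mode: EnforcementMode,
    clean: Callable[[Any, Any], Any],
) -> Any:
    """Run `clean` only when enforcement is on and a schema is available.

    Assumption: without a schema there is nothing to validate against,
    so the graph is returned unchanged even in STRICT mode.
    """
    if mode is EnforcementMode.NONE or schema is None:
        return graph
    return clean(graph, schema)
```

Because the enum subclasses `str`, a mode can also be built from its plain string value, e.g. `EnforcementMode("STRICT")`, which is convenient when the setting comes from a config file.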

The cleanup is performed at the chunk-graph level, after extraction and before post-processing, i.e., before relationships to the chunks are created.

Cleanup logic:

  • Remove invalid nodes (not conformant to the schema)
  • Remove invalid node properties (not conformant to the schema)
  • Remove nodes that are left with no properties
  • Remove invalid relations (whose types or start/end nodes are not conformant to the schema)
  • Remove relations whose start/end node was removed
  • Remove invalid relation properties (not conformant to the schema)
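
The cleanup steps listed above can be sketched in plain Python as follows. This is a simplified illustration, not the code from this PR: the `Node`, `Relationship`, and `Schema` dataclasses and the `clean_graph` function are hypothetical stand-ins for the library's own types, and the schema is assumed to list allowed property names per node label and relation type, plus allowed (start label, relation type, end label) triples.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str  # id of the start node
    end: str    # id of the end node
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Schema:
    node_props: dict  # label -> set of allowed property names
    rel_props: dict   # relation type -> set of allowed property names
    triples: set      # allowed (start label, relation type, end label)


def clean_graph(nodes, rels, schema):
    """Return (nodes, rels) with everything outside the schema removed."""
    kept_nodes = {}
    for node in nodes:
        allowed = schema.node_props.get(node.label)
        if allowed is None:
            continue  # invalid node: label not in the schema
        props = {k: v for k, v in node.properties.items() if k in allowed}
        if not props:
            continue  # node left with no valid properties
        kept_nodes[node.id] = Node(node.id, node.label, props)

    kept_rels = []
    for rel in rels:
        if rel.type not in schema.rel_props:
            continue  # invalid relation type
        start, end = kept_nodes.get(rel.start), kept_nodes.get(rel.end)
        if start is None or end is None:
            continue  # start or end node was removed (or never existed)
        if (start.label, rel.type, end.label) not in schema.triples:
            continue  # start/end labels not allowed for this relation type
        allowed = schema.rel_props[rel.type]
        props = {k: v for k, v in rel.properties.items() if k in allowed}
        kept_rels.append(Relationship(rel.start, rel.end, rel.type, props))

    return list(kept_nodes.values()), kept_rels


# Small usage example with made-up data.
schema = Schema(
    node_props={"Person": {"name"}, "Company": {"name"}},
    rel_props={"WORKS_AT": {"since"}},
    triples={("Person", "WORKS_AT", "Company")},
)
nodes = [
    Node("p1", "Person", {"name": "Ada", "mood": "happy"}),
    Node("p2", "Person", {"mood": "sad"}),   # no valid properties remain
    Node("a1", "Alien", {"name": "Zed"}),    # label not in the schema
    Node("c1", "Company", {"name": "Acme"}),
]
rels = [
    Relationship("p1", "c1", "WORKS_AT", {"since": 2020, "note": "x"}),
    Relationship("p2", "c1", "WORKS_AT"),    # start node gets removed
    Relationship("c1", "p1", "WORKS_AT"),    # direction not in the schema
    Relationship("p1", "c1", "LIKES"),       # type not in the schema
]
clean_nodes, clean_rels = clean_graph(nodes, rels, schema)
```

Nodes are filtered first so that the relation pass can check both "start/end node was removed" and "start/end labels are allowed" against the surviving node set in a single lookup.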

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

Note

Please provide an estimated complexity of this PR of either Low, Medium or High

Complexity:

How Has This Been Tested?

  • Unit tests
  • E2E tests
  • Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • E2E tests have been updated
  • Examples have been updated
  • New files have copyright header
  • CLA (https://neo4j.com/developer/cla/) has been signed
  • CHANGELOG.md updated if appropriate

@NathalieCharbel NathalieCharbel requested a review from a team as a code owner March 6, 2025 12:00
@NathalieCharbel NathalieCharbel changed the title Add optional strict mode for KG builder as a validation layer after entity and relation extraction Add optional schema enforcement for KG builder as a validation layer after entity and relation extraction Mar 6, 2025
Contributor

@stellasia stellasia left a comment

🌟

@NathalieCharbel NathalieCharbel force-pushed the kg-builder-strict-mode branch from b19de41 to 1dc0c42 Compare March 12, 2025 11:31
@NathalieCharbel NathalieCharbel force-pushed the kg-builder-strict-mode branch from 1dc0c42 to cf9fe86 Compare March 12, 2025 11:55
@NathalieCharbel NathalieCharbel merged commit 5b868aa into neo4j:main Mar 12, 2025
3 of 7 checks passed
@NathalieCharbel NathalieCharbel deleted the kg-builder-strict-mode branch March 12, 2025 12:50
2 participants